{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b0e12d13",
   "metadata": {},
   "source": [
    "# Data Preparation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "147f934d",
   "metadata": {},
   "source": [
    "To take full advantage of HftBacktest, you need tick-by-tick full order book and trade feed data. Unfortunately, unlike the daily bar data offered by platforms such as Yahoo Finance, free tick-by-tick full order book and trade feed data for HFT is not available. For cryptocurrencies, however, you can collect the full raw feed yourself."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d73f6b70",
   "metadata": {},
   "source": [
    "## Getting started from Binance Futures' raw feed data\n",
    "\n",
    "You can collect the Binance Futures feed yourself using the [Data Collector](https://github.com/nkaz001/hftbacktest/tree/master/collector)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "4694ac12",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "b'1723161255030314667 {\"stream\":\"btcusdt@depth@0ms\",\"data\":{\"e\":\"depthUpdate\",\"E\":1723161256299,\"T\":1723161256298,\"s\":\"BTCUSDT\",\"U\":5123107832006,\"u\":5123107837557,\"pu\":5123107831937,\"b\":[[\"58710.20\",\"0.014\"],[\"61496.50\",\"0.010\"],[\"61510.90\",\"0.000\"],[\"61641.50\",\"1.211\"],[\"61652.80\",\"0.195\"],[\"61653.30\",\"0.072\"],[\"61653.70\",\"0.067\"],[\"61657.90\",\"0.067\"],[\"61668.50\",\"0.086\"],[\"61670.60\",\"0.161\"],[\"61672.50\",\"0.821\"],[\"61673.60\",\"0.048\"],[\"61675.60\",\"0.050\"],[\"61684.50\",\"0.765\"],[\"61686.20\",\"0.008\"],[\"61701.80\",\"0.331\"],[\"61703.10\",\"0.238\"],[\"61715.90\",\"0.308\"],[\"61721.60\",\"0.235\"],[\"61724.10\",\"0.002\"],[\"61737.00\",\"0.015\"],[\"61739.00\",\"0.000\"],[\"61740.10\",\"0.008\"],[\"61740.50\",\"12.111\"],[\"61756.90\",\"0.550\"],[\"61758.70\",\"0.003\"],[\"61763.20\",\"0.014\"],[\"61764.10\",\"0.168\"],[\"61764.30\",\"0.000\"],[\"61765.50\",\"0.000\"],[\"61767.40\",\"0.004\"],[\"61768.20\",\"0.120\"],[\"61768.60\",\"0.020\"],[\"61768.90\",\"0.099\"],[\"61770.80\",\"0.049\"],[\"61771.10\",\"0.612\"],[\"61771.70\",\"0.010\"],[\"61773.50\",\"0.035\"],[\"61773.80\",\"0.025\"],[\"61774.00\",\"0.112\"],[\"61775.60\",\"0.010\"],[\"61776.00\",\"0.084\"],[\"61778.30\",\"0.000\"],[\"61778.60\",\"0.408\"],[\"61779.30\",\"0.020\"],[\"61779.60\",\"0.220\"],[\"61783.80\",\"0.002\"],[\"61784.90\",\"0.102\"],[\"61785.00\",\"0.000\"],[\"61788.10\",\"0.140\"],[\"61789.50\",\"0.000\"],[\"61798.70\",\"0.153\"],[\"61800.20\",\"2.507\"]],\"a\":[[\"61800.30\",\"3.330\"],[\"61804.60\",\"0.057\"],[\"61810.00\",\"0.285\"],[\"61812.00\",\"0.732\"],[\"61814.90\",\"0.000\"],[\"61817.20\",\"0.000\"],[\"61818.70\",\"0.040\"],[\"61824.00\",\"0.860\"],[\"61829.10\",\"0.185\"],[\"61831.30\",\"0.008\"],[\"61831.40\",\"0.501\"],[\"61839.00\",\"0.002\"],[\"61840.00\",\"0.192\"],[\"61856.30\",\"0.003\"],[\"61857.10\",\"0.027\"],[\"61857.40\",\"0.000\"],[\"61858.80\",\"0.005\"],[\"61858.90\",\"0.032\"],[\"61859.60\",\"0.034\"],[\"61874.80\",\"0.006\"],[\"61893.40\",\"0.335\"],[\"61911.90\",\"0.014\"],[\"61925.90\",\"0.000\"],[\"61930.50\",\"0.015\"],[\"61945.10\",\"0.000\"],[\"61953.70\",\"0.000\"],[\"62144.00\",\"0.006\"],[\"63113.70\",\"0.000\"],[\"65880.70\",\"15.918\"]]}}\\n'\n",
      "b'1723161255088169167 {\"stream\":\"btcusdt@bookTicker\",\"data\":{\"e\":\"bookTicker\",\"u\":5123107839020,\"s\":\"BTCUSDT\",\"b\":\"61800.20\",\"B\":\"2.507\",\"a\":\"61800.30\",\"A\":\"2.510\",\"T\":1723161256313,\"E\":1723161256313}}\\n'\n",
      "b'1723161255088176367 {\"stream\":\"btcusdt@trade\",\"data\":{\"e\":\"trade\",\"E\":1723161256322,\"T\":1723161256322,\"s\":\"BTCUSDT\",\"t\":5266583935,\"p\":\"61800.30\",\"q\":\"0.006\",\"X\":\"MARKET\",\"m\":false}}\\n'\n",
      "b'1723161255088181667 {\"stream\":\"btcusdt@bookTicker\",\"data\":{\"e\":\"bookTicker\",\"u\":5123107840008,\"s\":\"BTCUSDT\",\"b\":\"61800.20\",\"B\":\"2.507\",\"a\":\"61800.30\",\"A\":\"2.504\",\"T\":1723161256322,\"E\":1723161256322}}\\n'\n",
      "b'1723161255088182467 {\"stream\":\"btcusdt@bookTicker\",\"data\":{\"e\":\"bookTicker\",\"u\":5123107840016,\"s\":\"BTCUSDT\",\"b\":\"61800.20\",\"B\":\"2.507\",\"a\":\"61800.30\",\"A\":\"2.522\",\"T\":1723161256322,\"E\":1723161256322}}\\n'\n"
     ]
    }
   ],
   "source": [
    "import gzip\n",
    "\n",
    "with gzip.open('usdm/btcusdt_20240808.gz', 'r') as f:\n",
    "    for i in range(5):\n",
    "        line = f.readline()\n",
    "        print(line)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f27f86a",
   "metadata": {},
   "source": [
    "The first token of each line is the timestamp at which the message was received locally.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "**Note:** The timestamp is in nanoseconds.\n",
    "    \n",
    "</div>"
   ]
  },
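  {
   "cell_type": "markdown",
   "id": "c7a1e0b2",
   "metadata": {},
   "source": [
    "As a rough illustration of this layout, the sketch below splits one line into its local timestamp and its JSON payload. The helper name `parse_raw_line` and the trimmed sample payload are hypothetical and are not part of HftBacktest. The sketch assumes the line format shown above and that Binance reports the event time `E` in milliseconds.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "\n",
    "def parse_raw_line(line):\n",
    "    # Hypothetical helper: split a collector line into (local_ts_ns, message).\n",
    "    text = line.decode('utf-8').strip()\n",
    "    ts_token, payload = text.split(' ', 1)\n",
    "    return int(ts_token), json.loads(payload)\n",
    "\n",
    "\n",
    "# Trimmed bookTicker message rebuilt from the output above.\n",
    "payload = json.dumps({\n",
    "    'stream': 'btcusdt@bookTicker',\n",
    "    'data': {'e': 'bookTicker', 's': 'BTCUSDT', 'b': '61800.20', 'a': '61800.30', 'E': 1723161256313},\n",
    "})\n",
    "sample = ('1723161255088169167 ' + payload).encode('utf-8')\n",
    "\n",
    "local_ts, msg = parse_raw_line(sample)\n",
    "exch_ts = msg['data']['E'] * 1_000_000  # milliseconds to nanoseconds\n",
    "print(msg['stream'], local_ts, exch_ts, local_ts - exch_ts)\n",
    "```\n",
    "\n",
    "In practice you would pass the lines read from the gzip file instead of a hand-built sample."
   ]
  },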
  {
   "cell_type": "markdown",
   "id": "1ce42cbb",
   "metadata": {},
   "source": [
    "The data needs to be converted into the normalized format that HftBacktest accepts.  \n",
    "The `convert` method also attempts to correct timestamps by reordering the rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "e72631fd-93a2-4b1c-a753-9534511d6563",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Correcting the latency\n",
      "local_timestamp is ahead of exch_timestamp by 1272156851\n",
      "Correcting the event order\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "from hftbacktest.data.utils import binancefutures\n",
    "\n",
    "data = binancefutures.convert(\n",
    "    'usdm/btcusdt_20240808.gz',\n",
    "    combined_stream=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c1a2a62",
   "metadata": {},
   "source": [
    "The normalized data looks as follows. You can find more details in [Data](https://hftbacktest.readthedocs.io/en/latest/data.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "75f3daa2-f158-4c49-9e4b-1bf175ac0c7e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (491_973, 8)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>ev</th><th>exch_ts</th><th>local_ts</th><th>px</th><th>qty</th><th>order_id</th><th>ival</th><th>fval</th></tr><tr><td>u64</td><td>i64</td><td>i64</td><td>f64</td><td>f64</td><td>u64</td><td>i64</td><td>f64</td></tr></thead><tbody><tr><td>3758096385</td><td>1723161256298000000</td><td>1723161256302471518</td><td>58710.2</td><td>0.014</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096385</td><td>1723161256298000000</td><td>1723161256302471518</td><td>61496.5</td><td>0.01</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096385</td><td>1723161256298000000</td><td>1723161256302471518</td><td>61510.9</td><td>0.0</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096385</td><td>1723161256298000000</td><td>1723161256302471518</td><td>61641.5</td><td>1.211</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096385</td><td>1723161256298000000</td><td>1723161256302471518</td><td>61652.8</td><td>0.195</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>3489660929</td><td>1723161600030000000</td><td>1723161600043617932</td><td>62292.9</td><td>0.0</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096385</td><td>1723161600319000000</td><td>1723161600370793433</td><td>5000.0</td><td>2.321</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3489660929</td><td>1723161600709000000</td><td>1723161600760777134</td><td>61659.8</td><td>0.981</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096385</td><td>1723161601054000000</td><td>1723161601105649435</td><td>61631.7</td><td>0.283</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096385</td><td>1723161601054000000</td><td>1723161601105649435</td><td>61632.6</td><td>0.0</td><td>0</td><td>0</td><td>0.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (491_973, 8)\n",
       "┌────────────┬─────────────────────┬────────────────────┬─────────┬───────┬──────────┬──────┬──────┐\n",
       "│ ev         ┆ exch_ts             ┆ local_ts           ┆ px      ┆ qty   ┆ order_id ┆ ival ┆ fval │\n",
       "│ ---        ┆ ---                 ┆ ---                ┆ ---     ┆ ---   ┆ ---      ┆ ---  ┆ ---  │\n",
       "│ u64        ┆ i64                 ┆ i64                ┆ f64     ┆ f64   ┆ u64      ┆ i64  ┆ f64  │\n",
       "╞════════════╪═════════════════════╪════════════════════╪═════════╪═══════╪══════════╪══════╪══════╡\n",
       "│ 3758096385 ┆ 1723161256298000000 ┆ 172316125630247151 ┆ 58710.2 ┆ 0.014 ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 8                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096385 ┆ 1723161256298000000 ┆ 172316125630247151 ┆ 61496.5 ┆ 0.01  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 8                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096385 ┆ 1723161256298000000 ┆ 172316125630247151 ┆ 61510.9 ┆ 0.0   ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 8                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096385 ┆ 1723161256298000000 ┆ 172316125630247151 ┆ 61641.5 ┆ 1.211 ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 8                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096385 ┆ 1723161256298000000 ┆ 172316125630247151 ┆ 61652.8 ┆ 0.195 ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 8                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ …          ┆ …                   ┆ …                  ┆ …       ┆ …     ┆ …        ┆ …    ┆ …    │\n",
       "│ 3489660929 ┆ 1723161600030000000 ┆ 172316160004361793 ┆ 62292.9 ┆ 0.0   ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 2                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096385 ┆ 1723161600319000000 ┆ 172316160037079343 ┆ 5000.0  ┆ 2.321 ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 3                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3489660929 ┆ 1723161600709000000 ┆ 172316160076077713 ┆ 61659.8 ┆ 0.981 ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 4                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096385 ┆ 1723161601054000000 ┆ 172316160110564943 ┆ 61631.7 ┆ 0.283 ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 5                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096385 ┆ 1723161601054000000 ┆ 172316160110564943 ┆ 61632.6 ┆ 0.0   ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 5                  ┆         ┆       ┆          ┆      ┆      │\n",
       "└────────────┴─────────────────────┴────────────────────┴─────────┴───────┴──────────┴──────┴──────┘"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import polars as pl\n",
    "\n",
    "pl.DataFrame(data)"
   ]
  },
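  {
   "cell_type": "markdown",
   "id": "d4f29a61",
   "metadata": {},
   "source": [
    "The `ev` column packs several flags into a single integer. The sketch below decodes the values shown above, assuming a bit layout with exchange, local, buy, and sell flags in the top bits and the event type in the low byte. That layout is an assumption inferred from the displayed values, so check it against the Data documentation. The constants and `decode_ev` are defined locally for illustration and are not imported from HftBacktest.\n",
    "\n",
    "```python\n",
    "# Assumed flag layout; verify against the Data documentation.\n",
    "EXCH_EVENT = 1 << 31\n",
    "LOCAL_EVENT = 1 << 30\n",
    "BUY_EVENT = 1 << 29\n",
    "SELL_EVENT = 1 << 28\n",
    "EVENT_TYPES = {1: 'depth', 2: 'trade', 3: 'depth_clear', 4: 'depth_snapshot'}\n",
    "\n",
    "\n",
    "def decode_ev(ev):\n",
    "    # Hypothetical helper. Pass a plain Python int, e.g. int(row['ev']).\n",
    "    if ev & BUY_EVENT:\n",
    "        side = 'buy'\n",
    "    elif ev & SELL_EVENT:\n",
    "        side = 'sell'\n",
    "    else:\n",
    "        side = 'none'\n",
    "    return {\n",
    "        'type': EVENT_TYPES.get(ev & 0xFF, 'unknown'),\n",
    "        'side': side,\n",
    "        'exch': bool(ev & EXCH_EVENT),\n",
    "        'local': bool(ev & LOCAL_EVENT),\n",
    "    }\n",
    "\n",
    "\n",
    "for ev in (3758096385, 3489660929, 3758096386):\n",
    "    print(ev, decode_ev(ev))\n",
    "```\n",
    "\n",
    "Under this layout, `3758096385` is a bid-side depth event and `3489660929` is an ask-side depth event, both valid for the exchange and local feeds."
   ]
  },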
  {
   "cell_type": "markdown",
   "id": "a5ff53f1",
   "metadata": {},
   "source": [
    "You can save the data directly to a file by providing `output_filename`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d8b8fd5b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Correcting the latency\n",
      "local_timestamp is ahead of exch_timestamp by 1272156851\n",
      "Correcting the event order\n",
      "Saving to usdm/btcusdt_20240808.npz\n"
     ]
    }
   ],
   "source": [
    "_ = binancefutures.convert(\n",
    "    'usdm/btcusdt_20240808.gz',\n",
    "    output_filename='usdm/btcusdt_20240808.npz',\n",
    "    combined_stream=True\n",
    ")"
   ]
  },
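  {
   "cell_type": "markdown",
   "id": "e8b30c75",
   "metadata": {},
   "source": [
    "To inspect a saved file later, you can load it back with NumPy. The sketch below is a self-contained round trip on a temporary file rather than the real output. It assumes that the array is stored under the key `data` and that the record fields match the columns displayed above. Both are assumptions to verify against your own file, for example by checking `np.load(path).files`.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "# Field names and types copied from the DataFrame display above (assumed layout).\n",
    "event_dtype = np.dtype([\n",
    "    ('ev', 'u8'), ('exch_ts', 'i8'), ('local_ts', 'i8'), ('px', 'f8'),\n",
    "    ('qty', 'f8'), ('order_id', 'u8'), ('ival', 'i8'), ('fval', 'f8'),\n",
    "])\n",
    "rows = np.array(\n",
    "    [(3758096385, 1723161256298000000, 1723161256302471518, 58710.2, 0.014, 0, 0, 0.0)],\n",
    "    dtype=event_dtype,\n",
    ")\n",
    "\n",
    "# Write a stand-in file, then read it back the way a converted file could be read.\n",
    "path = os.path.join(tempfile.mkdtemp(), 'sample.npz')\n",
    "np.savez_compressed(path, data=rows)\n",
    "\n",
    "loaded = np.load(path)['data']\n",
    "print(loaded.dtype.names)\n",
    "print(loaded['px'][0], loaded['local_ts'][0])\n",
    "```"
   ]
  },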
  {
   "cell_type": "markdown",
   "id": "369d5f17",
   "metadata": {},
   "source": [
    "## Creating a market depth snapshot"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04f3f3dd",
   "metadata": {},
   "source": [
    "Because the Binance Futures exchange runs 24/7, you need an initial snapshot to get the (almost) complete market depth.  \n",
    "[Data Collector](https://github.com/nkaz001/hftbacktest/tree/master/collector) fetches the snapshot only when it makes the connection, so you need to build the initial snapshot from the start of the collected feed data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "3045cb73-ee69-4ef6-89d4-ae754b7132e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "from hftbacktest.data.utils.snapshot import create_last_snapshot\n",
    "\n",
    "# Builds 20240808 End of Day snapshot. It will be used for the initial snapshot for 20240809.\n",
    "data = create_last_snapshot(\n",
    "    ['usdm/btcusdt_20240808.npz'],\n",
    "    tick_size=0.1,\n",
    "    lot_size=0.001\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "abc51f5f-f568-435d-94a0-3dead470508c",
   "metadata": {},
   "source": [
    "Bid levels are shown before ask levels in the snapshot, and levels are sorted from the best price to the farthest price."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "cc2abfae-8868-4c25-920e-5df1fea0cf53",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (9_597, 8)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>ev</th><th>exch_ts</th><th>local_ts</th><th>px</th><th>qty</th><th>order_id</th><th>ival</th><th>fval</th></tr><tr><td>u64</td><td>i64</td><td>i64</td><td>f64</td><td>f64</td><td>u64</td><td>i64</td><td>f64</td></tr></thead><tbody><tr><td>3758096388</td><td>0</td><td>0</td><td>61659.7</td><td>1.486</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096388</td><td>0</td><td>0</td><td>61659.0</td><td>0.002</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096388</td><td>0</td><td>0</td><td>61658.1</td><td>0.033</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096388</td><td>0</td><td>0</td><td>61658.0</td><td>6.718</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096388</td><td>0</td><td>0</td><td>61657.9</td><td>0.007</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>3489660932</td><td>0</td><td>0</td><td>77354.3</td><td>0.015</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3489660932</td><td>0</td><td>0</td><td>77905.9</td><td>0.003</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3489660932</td><td>0</td><td>0</td><td>80000.0</td><td>10.708</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3489660932</td><td>0</td><td>0</td><td>104765.0</td><td>0.034</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3489660932</td><td>0</td><td>0</td><td>617050.0</td><td>0.003</td><td>0</td><td>0</td><td>0.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (9_597, 8)\n",
       "┌────────────┬─────────┬──────────┬──────────┬────────┬──────────┬──────┬──────┐\n",
       "│ ev         ┆ exch_ts ┆ local_ts ┆ px       ┆ qty    ┆ order_id ┆ ival ┆ fval │\n",
       "│ ---        ┆ ---     ┆ ---      ┆ ---      ┆ ---    ┆ ---      ┆ ---  ┆ ---  │\n",
       "│ u64        ┆ i64     ┆ i64      ┆ f64      ┆ f64    ┆ u64      ┆ i64  ┆ f64  │\n",
       "╞════════════╪═════════╪══════════╪══════════╪════════╪══════════╪══════╪══════╡\n",
       "│ 3758096388 ┆ 0       ┆ 0        ┆ 61659.7  ┆ 1.486  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│ 3758096388 ┆ 0       ┆ 0        ┆ 61659.0  ┆ 0.002  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│ 3758096388 ┆ 0       ┆ 0        ┆ 61658.1  ┆ 0.033  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│ 3758096388 ┆ 0       ┆ 0        ┆ 61658.0  ┆ 6.718  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│ 3758096388 ┆ 0       ┆ 0        ┆ 61657.9  ┆ 0.007  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│ …          ┆ …       ┆ …        ┆ …        ┆ …      ┆ …        ┆ …    ┆ …    │\n",
       "│ 3489660932 ┆ 0       ┆ 0        ┆ 77354.3  ┆ 0.015  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│ 3489660932 ┆ 0       ┆ 0        ┆ 77905.9  ┆ 0.003  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│ 3489660932 ┆ 0       ┆ 0        ┆ 80000.0  ┆ 10.708 ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│ 3489660932 ┆ 0       ┆ 0        ┆ 104765.0 ┆ 0.034  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│ 3489660932 ┆ 0       ┆ 0        ┆ 617050.0 ┆ 0.003  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "└────────────┴─────────┴──────────┴──────────┴────────┴──────────┴──────┴──────┘"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pl.DataFrame(data)"
   ]
  },
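  {
   "cell_type": "markdown",
   "id": "f1a6d3c9",
   "metadata": {},
   "source": [
    "The idea behind such a snapshot can be sketched in plain Python: replay the depth updates into a book, then emit bids from the best price downward followed by asks from the best price upward. This is only an illustration of the ordering described above, not the implementation of `create_last_snapshot`. The helper `apply_update` and the sample updates are hypothetical.\n",
    "\n",
    "```python\n",
    "def apply_update(book, price, qty):\n",
    "    # A quantity of zero removes the level; any other value overwrites it.\n",
    "    if qty == 0:\n",
    "        book.pop(price, None)\n",
    "    else:\n",
    "        book[price] = qty\n",
    "\n",
    "\n",
    "bids, asks = {}, {}\n",
    "updates = [\n",
    "    ('bid', 61800.2, 2.507), ('bid', 61798.7, 0.153), ('bid', 61789.5, 0.0),\n",
    "    ('ask', 61804.6, 0.057), ('ask', 61800.3, 3.33), ('bid', 61798.7, 0.0),\n",
    "]\n",
    "for side, price, qty in updates:\n",
    "    apply_update(bids if side == 'bid' else asks, price, qty)\n",
    "\n",
    "# Bids first (highest price first), then asks (lowest price first).\n",
    "snapshot = (\n",
    "    [('bid', p, q) for p, q in sorted(bids.items(), reverse=True)]\n",
    "    + [('ask', p, q) for p, q in sorted(asks.items())]\n",
    ")\n",
    "print(snapshot)\n",
    "```\n",
    "\n",
    "Levels that were never seen in the replayed feed are missing from the result, which is why the snapshot is only almost complete."
   ]
  },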
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "2ba2557f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from hftbacktest.data.utils.snapshot import create_last_snapshot\n",
    "\n",
    "# Builds 20240808 End of Day snapshot. It will be used for the initial snapshot for 20240809.\n",
    "_ = create_last_snapshot(\n",
    "    ['usdm/btcusdt_20240808.npz'],\n",
    "    tick_size=0.1,\n",
    "    lot_size=0.001,\n",
    "    output_snapshot_filename='usdm/btcusdt_20240808_eod.npz'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "7a74be64-cf31-4c3c-b3df-f7829b4178ed",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Correcting the latency\n",
      "local_timestamp is ahead of exch_timestamp by 1273873720\n",
      "Correcting the event order\n",
      "Saving to usdm/btcusdt_20240809.npz\n"
     ]
    }
   ],
   "source": [
    "# Converts 20240809 data.\n",
    "_ = binancefutures.convert(\n",
    "    'usdm/btcusdt_20240809.gz',\n",
    "    output_filename='usdm/btcusdt_20240809.npz',\n",
    "    combined_stream=True\n",
    ")\n",
    "\n",
    "# Builds 20240809's last snapshot.\n",
    "# Due to the file size limitation of GitHub, btcusdt_20240809.npz does not contain data for the entire day.\n",
    "_ = create_last_snapshot(\n",
    "    ['usdm/btcusdt_20240809.npz'],\n",
    "    tick_size=0.1,\n",
    "    lot_size=0.001,\n",
    "    output_snapshot_filename='usdm/btcusdt_20240809_last.npz',\n",
    "    initial_snapshot='usdm/btcusdt_20240808_eod.npz',\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "9b213182-b619-46be-a39e-eec99de7b988",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Builds 20240809's last snapshot without the initial snapshot.\n",
    "_ = create_last_snapshot(\n",
    "    ['usdm/btcusdt_20240809.npz'],\n",
    "    tick_size=0.1,\n",
    "    lot_size=0.001,\n",
    "    output_snapshot_filename='usdm/btcusdt_20240809_last_wo_ss.npz'\n",
    ")\n",
    "\n",
    "# Builds the 20240809's last snapshot from 20240808 without the initial snapshot.\n",
    "_ = create_last_snapshot(\n",
    "    [\n",
    "        'usdm/btcusdt_20240808.npz',\n",
    "        'usdm/btcusdt_20240809.npz'\n",
    "    ],\n",
    "    tick_size=0.1,\n",
    "    lot_size=0.001,\n",
    "    output_snapshot_filename='usdm/btcusdt_20240809_last.npz'\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63f79c14",
   "metadata": {},
   "source": [
    "## Getting started from Tardis.dev data\n",
    "\n",
    "Few vendors offer tick-by-tick full market depth data along with snapshot and trade data, and Tardis.dev is among them.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "**Note:** Some data may have an issue with the exchange timestamp. Ideally, the exchange timestamp should reflect the moment the event occurs at the matching engine. However, some data uses the time at which the server sent the data instead of the matching engine timestamp.\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "016cbeb0-808a-4aaf-a18a-88e885161841",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-08-09 09:42:51--  https://datasets.tardis.dev/v1/binance-futures/trades/2020/02/01/BTCUSDT.csv.gz\n",
      "Resolving datasets.tardis.dev (datasets.tardis.dev)... 104.18.6.96, 104.18.7.96, 2606:4700::6812:760, ...\n",
      "Connecting to datasets.tardis.dev (datasets.tardis.dev)|104.18.6.96|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 3090479 (2.9M) [text/csv]\n",
      "Saving to: ‘BTCUSDT_trades.csv.gz’\n",
      "\n",
      "BTCUSDT_trades.csv. 100%[===================>]   2.95M  5.66MB/s    in 0.5s    \n",
      "\n",
      "2024-08-09 09:42:52 (5.66 MB/s) - ‘BTCUSDT_trades.csv.gz’ saved [3090479/3090479]\n",
      "\n",
      "--2024-08-09 09:42:52--  https://datasets.tardis.dev/v1/binance-futures/incremental_book_L2/2020/02/01/BTCUSDT.csv.gz\n",
      "Resolving datasets.tardis.dev (datasets.tardis.dev)... 104.18.7.96, 104.18.6.96, 2606:4700::6812:760, ...\n",
      "Connecting to datasets.tardis.dev (datasets.tardis.dev)|104.18.7.96|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 250016849 (238M) [text/csv]\n",
      "Saving to: ‘BTCUSDT_book.csv.gz’\n",
      "\n",
      "BTCUSDT_book.csv.gz 100%[===================>] 238.43M  9.93MB/s    in 23s     \n",
      "\n",
      "2024-08-09 09:43:16 (10.3 MB/s) - ‘BTCUSDT_book.csv.gz’ saved [250016849/250016849]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# https://docs.tardis.dev/historical-data-details/binance-futures\n",
    "\n",
    "# Downloads sample Binance futures BTCUSDT trades\n",
    "!wget https://datasets.tardis.dev/v1/binance-futures/trades/2020/02/01/BTCUSDT.csv.gz -O BTCUSDT_trades.csv.gz\n",
    "    \n",
    "# Downloads sample Binance futures BTCUSDT book\n",
    "!wget https://datasets.tardis.dev/v1/binance-futures/incremental_book_L2/2020/02/01/BTCUSDT.csv.gz -O BTCUSDT_book.csv.gz"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d16bdd3-2843-457c-ac20-680b27b76692",
   "metadata": {},
   "source": [
    "It is recommended to input trade files before depth files. If a depth event is caused by a trade event, placing the trade event before the depth event can produce a more realistic fill during backtesting. The input order matters because the sorting process prioritizes events from the first input file when two events have the same timestamp."
   ]
  },
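  {
   "cell_type": "markdown",
   "id": "a9c5b7e4",
   "metadata": {},
   "source": [
    "The tie-breaking rule described above behaves like a stable sort on the timestamp. The sketch below illustrates that with made-up events. It is not the sorting code used by `tardis.convert`, and the tuples are hypothetical stand-ins for real rows.\n",
    "\n",
    "```python\n",
    "# (timestamp, kind) pairs; the first list plays the role of the first input file.\n",
    "trades = [(100, 'trade'), (105, 'trade')]\n",
    "depth = [(100, 'depth'), (103, 'depth'), (105, 'depth')]\n",
    "\n",
    "# sorted() is stable, so events with equal timestamps keep their input order.\n",
    "merged = sorted(trades + depth, key=lambda event: event[0])\n",
    "print(merged)\n",
    "```\n",
    "\n",
    "With the trade list first, each trade lands ahead of the depth event that shares its timestamp. Swapping the two lists would reverse that order."
   ]
  },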
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "2a94dc09",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Reading BTCUSDT_trades.csv.gz\n",
      "Reading BTCUSDT_book.csv.gz\n",
      "Correcting the latency\n",
      "Correcting the event order\n"
     ]
    }
   ],
   "source": [
    "from hftbacktest.data.utils import tardis\n",
    "\n",
    "data = tardis.convert(\n",
    "    ['BTCUSDT_trades.csv.gz', 'BTCUSDT_book.csv.gz']\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "4ee3977f-0333-49e8-9a25-f10c4f5fd856",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (27_532_602, 8)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>ev</th><th>exch_ts</th><th>local_ts</th><th>px</th><th>qty</th><th>order_id</th><th>ival</th><th>fval</th></tr><tr><td>u64</td><td>i64</td><td>i64</td><td>f64</td><td>f64</td><td>u64</td><td>i64</td><td>f64</td></tr></thead><tbody><tr><td>3758096386</td><td>1580515202342000000</td><td>1580515202497052000</td><td>9364.51</td><td>1.197</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096386</td><td>1580515202342000000</td><td>1580515202497346000</td><td>9365.67</td><td>0.02</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096386</td><td>1580515202342000000</td><td>1580515202497352000</td><td>9365.86</td><td>0.01</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096386</td><td>1580515202342000000</td><td>1580515202497357000</td><td>9366.36</td><td>0.002</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096386</td><td>1580515202342000000</td><td>1580515202497363000</td><td>9366.36</td><td>0.003</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>3489660929</td><td>1580601599812000000</td><td>1580601599944404000</td><td>9397.79</td><td>0.0</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096385</td><td>1580601599826000000</td><td>1580601599952176000</td><td>9354.8</td><td>4.07</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096385</td><td>1580601599836000000</td><td>1580601599962961000</td><td>9351.47</td><td>3.914</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3489660929</td><td>1580601599836000000</td><td>1580601599963461000</td><td>9397.78</td><td>0.1</td><td>0</td><td>0</td><td>0.0</td></tr><tr><td>3758096385</td><td>1580601599848000000</td><td>1580601599973647000</td><td>9348.14</td><td>3.98</td><td>0</td><td>0</td><td>0.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (27_532_602, 8)\n",
       "┌────────────┬─────────────────────┬────────────────────┬─────────┬───────┬──────────┬──────┬──────┐\n",
       "│ ev         ┆ exch_ts             ┆ local_ts           ┆ px      ┆ qty   ┆ order_id ┆ ival ┆ fval │\n",
       "│ ---        ┆ ---                 ┆ ---                ┆ ---     ┆ ---   ┆ ---      ┆ ---  ┆ ---  │\n",
       "│ u64        ┆ i64                 ┆ i64                ┆ f64     ┆ f64   ┆ u64      ┆ i64  ┆ f64  │\n",
       "╞════════════╪═════════════════════╪════════════════════╪═════════╪═══════╪══════════╪══════╪══════╡\n",
       "│ 3758096386 ┆ 1580515202342000000 ┆ 158051520249705200 ┆ 9364.51 ┆ 1.197 ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 0                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096386 ┆ 1580515202342000000 ┆ 158051520249734600 ┆ 9365.67 ┆ 0.02  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 0                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096386 ┆ 1580515202342000000 ┆ 158051520249735200 ┆ 9365.86 ┆ 0.01  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 0                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096386 ┆ 1580515202342000000 ┆ 158051520249735700 ┆ 9366.36 ┆ 0.002 ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 0                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096386 ┆ 1580515202342000000 ┆ 158051520249736300 ┆ 9366.36 ┆ 0.003 ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 0                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ …          ┆ …                   ┆ …                  ┆ …       ┆ …     ┆ …        ┆ …    ┆ …    │\n",
       "│ 3489660929 ┆ 1580601599812000000 ┆ 158060159994440400 ┆ 9397.79 ┆ 0.0   ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 0                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096385 ┆ 1580601599826000000 ┆ 158060159995217600 ┆ 9354.8  ┆ 4.07  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 0                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096385 ┆ 1580601599836000000 ┆ 158060159996296100 ┆ 9351.47 ┆ 3.914 ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 0                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3489660929 ┆ 1580601599836000000 ┆ 158060159996346100 ┆ 9397.78 ┆ 0.1   ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 0                  ┆         ┆       ┆          ┆      ┆      │\n",
       "│ 3758096385 ┆ 1580601599848000000 ┆ 158060159997364700 ┆ 9348.14 ┆ 3.98  ┆ 0        ┆ 0    ┆ 0.0  │\n",
       "│            ┆                     ┆ 0                  ┆         ┆       ┆          ┆      ┆      │\n",
       "└────────────┴─────────────────────┴────────────────────┴─────────┴───────┴──────────┴──────┴──────┘"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pl.DataFrame(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b863d7cb",
   "metadata": {},
   "source": [
    "You can save the data directly to a file by providing `output_filename`. If there are too many rows, you need to increase `buffer_size`.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "f2026ce5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Reading BTCUSDT_trades.csv.gz\n",
      "Reading BTCUSDT_book.csv.gz\n",
      "Correcting the latency\n",
      "Correcting the event order\n",
      "Saving to btcusdt_20200201.npz\n"
     ]
    }
   ],
   "source": [
    "_ = tardis.convert(\n",
    "    ['BTCUSDT_trades.csv.gz', 'BTCUSDT_book.csv.gz'],\n",
    "    output_filename='btcusdt_20200201.npz',\n",
    "    buffer_size=200_000_000\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "736272d6",
   "metadata": {},
   "source": [
    "Tardis.dev artificially inserts the SOD (start-of-day) snapshot at the start of each daily file. If you backtest multiple days continuously, you don't need a snapshot at the start of every day, and processing it may lengthen the backtest. You can use an option to choose whether to include Tardis.dev's SOD snapshot in the converted file."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
